Lambda Phage analysis to determine conversion efficiency

Read data location 
/Volumes/NGS\ Drive/NGS\ Raw\ Data/Oyster_gonad_Bisulfite/full/filtered_174gm_A_NoIndex_L006_R1.fastq.gz  
/Volumes/NGS\ Drive/NGS\ Raw\ Data/Oyster_gonad_Bisulfite/full/filtered_174gm_A_NoIndex_L006_R2.fastq.gz  



code
./bsmap -a //Volumes/NGS\ Drive/NGS\ Raw\ Data/Oyster_gonad_Bisulfite/full/filtered_174gm_A_NoIndex_L006_R1.fastq.gz -b  /Volumes/NGS\ Drive/NGS\ Raw\ Data/Oyster_gonad_Bisulfite/full/filtered_174gm_A_NoIndex_L006_R2.fastq.gz -d /Volumes/web/cnidarian/phage_lambda_genome.fasta -o /Volumes/web/cnidarian/BiGO_BSMAP_Gonad_lambda.sam -p 2 -n 1

*Note*
-n  [0,1]   set mapping strand information:
            -n 0: only map to 2 forward strands, i.e. BSW(++) and BSC(-+)    (i.e. the "Lister protocol")
            for PE sequencing, map read#1 to ++ and -+, read#2 to +- and --.
            -n 1: map SE or PE reads to all 4 strands, i.e. ++, +-, -+, --    (i.e. the "Cokus protocol")
            default: -n 0. Most bisulfite sequencing data is generated only from forward strands.



Input reference file: /Volumes/web/cnidarian/phage_lambda_genome.fasta      (format: FASTA)
Load in 1 db seqs, total size 48502 bp. 0 secs passed
total_kmers: 43046721
Create seed table. 2 secs passed
max number of mismatches: read_length * 8%      max gap size: 0
kmer cut-off ratio:5e-07
max multi-hits: 100     max Ns: 5     seed size: 16     index interval: 4
quality cutoff: 0     base quality char: '!'
min fragment size:28     max fragemt size:500
start from read #1     end at read #4294967295
additional alignment: T in reads => C in reference
mapping strand (read_1): ++,-+,+-,--
mapping strand (read_2): +-,--,++,-+
Pair-end alignment(2 threads)
Input read file #1: //Volumes/NGS Drive/NGS Raw Data/Oyster_gonad_Bisulfite/full/filtered_174gm_A_NoIndex_L006_R1.fastq.gz      (format: gzipped FASTQ)
Input read file #2: /Volumes/NGS Drive/NGS Raw Data/Oyster_gonad_Bisulfite/full/filtered_174gm_A_NoIndex_L006_R2.fastq.gz      (format: gzipped FASTQ)
Output file: /Volumes/web/cnidarian/BiGO_BSMAP_Gonad_lambda.sam      (format: SAM)

Total number of aligned reads:
pairs:       189393 (0.11%)
single a:    7462 (0.0043%)
single b:    7845 (0.0046%)
Done.


Run SAM through methratio script

python methratio.py -d /Volumes/web/cnidarian/phage_lambda_genome.fasta -u -z -g -o  /Volumes/web/cnidarian/BiGO_methratio_lambda_B.txt -s /Volumes/Bay3/Software/BSMAP/bsmap-2.73/samtools /Volumes/web/cnidarian/BiGO_BSMAP_Gonad_lambda.sam



Options:
  -h, --help            show this help message and exit
  -o FILE, --out=FILE   output file name. (required)
  -d FILE, --ref=FILE   reference genome fasta file. (required)
  -c CHR, --chr=CHR     process only specified chromosomes, separated by ','.
                        [default: all] example: --chroms=chr1,chr2
  -s PATH, --sam-path=PATH
                        path to samtools. [default: none]
  -u, --unique          process only unique mappings/pairs.
  -p, --pair            process only properly paired mappings.
  -z, --zero-meth       report loci with zero methylation ratios.
  -q, --quiet           don't print progress on stderr.
  -r, --remove-duplicate
                        remove duplicated reads.
  -t N, --trim-fillin=N
                        trim N end-repairing fill-in nucleotides. [default: 2]
  -g, --combine-CpG     combine CpG methylaion ratios on both strands.
  -m FOLD, --min-depth=FOLD
                        report loci with sequencing depth>=FOLD. [default: 1]
  -n, --no-header       don't print a header line
  -i CT_SNP, --ct-snp=CT_SNP
                        how to handle CT SNP ("no-action", "correct", "skip"),
                        default: "correct".


total 394093 valid mappings, 21067 covered cytosines, average coverage: 272.88 fold.

http://eagle.fish.washington.edu/cnidarian/BiGO_methratio_lambda_B.txt 







python methratio.py -d /Volumes/web/cnidarian/phage_lambda_genome.fasta -u -g -p -o  /Volumes/web/cnidarian/BiGO_methratio_lambda_C.txt -s /Volumes/Bay3/Software/BSMAP/bsmap-2.73/samtools /Volumes/web/cnidarian/BiGO_BSMAP_Gonad_lambda.sam

total 378786 valid mappings, 21062 covered cytosines, average coverage: 259.85 fold.



 



python methratio.py -d /Volumes/web/cnidarian/phage_lambda_genome.fasta -u -p -o  /Volumes/web/cnidarian/BiGO_methratio_lambda_D.txt -s /Volumes/Bay3/Software/BSMAP/bsmap-2.73/samtools /Volumes/web/cnidarian/BiGO_BSMAP_Gonad_lambda.sam

total 378786 valid mappings, 24173 covered cytosines, average coverage: 226.40 fold. 
















New BSMAP with alternative -n

code
./bsmap -a //Volumes/NGS\ Drive/NGS\ Raw\ Data/Oyster_gonad_Bisulfite/full/filtered_174gm_A_NoIndex_L006_R1.fastq.gz -b  /Volumes/NGS\ Drive/NGS\ Raw\ Data/Oyster_gonad_Bisulfite/full/filtered_174gm_A_NoIndex_L006_R2.fastq.gz -d /Volumes/web/cnidarian/phage_lambda_genome.fasta -o /Volumes/web/cnidarian/BiGO_BSMAP_Gonad_lambda_v2.sam -p 3


Input reference file: /Volumes/web/cnidarian/phage_lambda_genome.fasta      (format: FASTA)
Load in 1 db seqs, total size 48502 bp. 0 secs passed
total_kmers: 43046721
Create seed table. 2 secs passed
max number of mismatches: read_length * 8%      max gap size: 0
kmer cut-off ratio:5e-07
max multi-hits: 100     max Ns: 5     seed size: 16     index interval: 4
quality cutoff: 0     base quality char: '!'
min fragment size:28     max fragemt size:500
start from read #1     end at read #4294967295
additional alignment: T in reads => C in reference
mapping strand (read_1): ++,-+
mapping strand (read_2): +-,--
Pair-end alignment(3 threads)
Input read file #1: //Volumes/NGS Drive/NGS Raw Data/Oyster_gonad_Bisulfite/full/filtered_174gm_A_NoIndex_L006_R1.fastq.gz      (format: gzipped FASTQ)
Input read file #2: /Volumes/NGS Drive/NGS Raw Data/Oyster_gonad_Bisulfite/full/filtered_174gm_A_NoIndex_L006_R2.fastq.gz      (format: gzipped FASTQ)
Output file: /Volumes/web/cnidarian/BiGO_BSMAP_Gonad_lambda_v2.sam      (format: SAM)



Total number of aligned reads:
pairs:       189394 (0.11%)
single a:    7460 (0.0043%)
single b:    7844 (0.0046%)


*Note*
Changing -n value does not significantly alter number of aligned reads.




python methratio.py -d /Volumes/web/cnidarian/phage_lambda_genome.fasta -u -g -p -o  /Volumes/web/cnidarian/BiGO_methratio_lambda_M.txt -s /Volumes/Bay3/Software/BSMAP/bsmap-2.73/samtools /Volumes/web/cnidarian/BiGO_BSMAP_Gonad_lambda_v2.sam


------

Run methratio in order to determine conversion efficiency

python methratio.py -d /Volumes/web/cnidarian/phage_lambda_genome.fasta -u -z -g -p -o  /Volumes/web/cnidarian/BiGO_methratio_lambda_N.txt -s /Volumes/Bay3/Software/BSMAP/bsmap-2.73/samtools /Volumes/web/cnidarian/BiGO_BSMAP_Gonad_lambda_v2.sam


@ Tue Apr 16 10:26:09 2013: reading reference /Volumes/web/cnidarian/phage_lambda_genome.fasta ...
@ Tue Apr 16 10:26:09 2013: reading /Volumes/web/cnidarian/BiGO_BSMAP_Gonad_lambda_v2.sam ...
[samopen] SAM header is present: 1 sequences.
@ Tue Apr 16 10:26:33 2013: combining CpG methylation from both strands ...
@ Tue Apr 16 10:26:33 2013: writing /Volumes/web/cnidarian/BiGO_methratio_lambda_N.txt ...
@ Tue Apr 16 10:26:33 2013: done.
total 378776 valid mappings, 21062 covered cytosines, average coverage: 259.84 fold.

http://eagle.fish.washington.edu/cnidarian/BiGO_methratio_lambda_N.txt